
[BUGFIX] Skip tokenization support for throughput benchmark #12712

Merged

Conversation

maleksan85
Contributor

@maleksan85 maleksan85 commented Feb 3, 2025

Fixes support for running vLLM without a tokenizer in the throughput benchmark.

Re-submits #12489, which was accidentally corrupted.
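For illustration, a minimal sketch of what the restored path amounts to, assuming current vLLM APIs (LLM, TokensPrompt, SamplingParams); the model name, prompt lengths, and batch size are placeholders rather than the benchmark's actual defaults. When the tokenizer is skipped, the benchmark has to hand the engine token IDs rather than text prompts.

```python
# Sketch only, not the PR's exact diff: run the engine with no tokenizer and
# feed it pre-generated token IDs, which is what the throughput benchmark
# needs to do when --skip_tokenizer_init is passed.
import random

from vllm import LLM, SamplingParams
from vllm.inputs import TokensPrompt

llm = LLM(model="facebook/opt-125m", skip_tokenizer_init=True)  # placeholder model

# With no tokenizer, prompts must already be token IDs; random IDs are fine
# for a throughput measurement.
prompts = [
    TokensPrompt(prompt_token_ids=[random.randint(0, 9999) for _ in range(128)])
    for _ in range(8)
]

# detokenize=False keeps outputs as token IDs, since there is no tokenizer to
# turn them back into text.
sampling_params = SamplingParams(max_tokens=64, detokenize=False)

outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(len(out.outputs[0].token_ids), "output tokens")
```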


github-actions bot commented Feb 3, 2025

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@maleksan85 maleksan85 changed the title from "adding support for skip tokenizer benchmarking" to "[BUGFIX] Skip tokenization support for throughtput benchmark" Feb 3, 2025
@maleksan85
Contributor Author

maleksan85 commented Feb 3, 2025

cc @comaniac. This is a copy of #12489. Not yet tried on V1, as there were issues with V1 last week.

@comaniac comaniac self-assigned this Feb 3, 2025
@comaniac
Collaborator

comaniac commented Feb 3, 2025

Thanks. Will approve if it works for v1.

@maleksan85
Contributor Author

maleksan85 commented Feb 28, 2025

Thanks. Will approve if it works for v1.

Sorry for the delay; I just got spare time to test it, and V1 is working :) So both V0 and V1 work well and show a throughput improvement.

Collaborator

@tlrmchlsmth tlrmchlsmth left a comment

On V0, I see a lot of warning spam like the following when running with --skip_tokenizer_init. Have you noticed this as well?

WARNING 03-04 20:39:37 [preprocess.py:59] Using None for EOS token id because tokenizer is not initialized
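For context, a hedged sketch of why this warning is mostly harmless for benchmarking: without a tokenizer the engine has no EOS id to stop on, so benchmark-style requests typically pin the output length instead (the exact values below are placeholders).

```python
from vllm import SamplingParams

# Assumption: with skip_tokenizer_init there is no EOS token to stop on, so the
# benchmark forces a fixed output length and never relies on EOS.
sampling_params = SamplingParams(
    max_tokens=128,    # fixed output length per request
    ignore_eos=True,   # EOS id is None anyway; do not stop early on EOS
    detokenize=False,  # no tokenizer available to turn IDs back into text
)
```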

Collaborator

@tlrmchlsmth tlrmchlsmth left a comment

Please merge in latest main


mergify bot commented Mar 4, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @maleksan85.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 4, 2025
Aleksandr Malyshev added 2 commits March 4, 2025 22:49
Signed-off-by: Aleksandr Malyshev <[email protected]>
Signed-off-by: Aleksandr Malyshev <[email protected]>
@mergify mergify bot removed the needs-rebase label Mar 4, 2025
@tlrmchlsmth tlrmchlsmth changed the title from "[BUGFIX] Skip tokenization support for throughtput benchmark" to "[BUGFIX] Skip tokenization support for throughput benchmark" Mar 5, 2025
Signed-off-by: Aleksandr Malyshev <[email protected]>
@njhill
Member

njhill commented Mar 6, 2025

I don't think we support the skip_tokenizer_init option yet in V1 (we should make sure there's a check on the V1 path that raises an appropriate error).

@maleksan85 WDYT about handling this in the benchmarks without using that option - in particular having explicit skip_tokenization and skip_detokenization benchmark options where the former controls whether token_ids are used instead of text prompts (per your change in this PR), and the latter is handled by #11697?

This should work in V1 now that #14224 is merged.
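For illustration, a rough sketch of the suggestion above; skip_tokenization and skip_detokenization are hypothetical flag names taken from this comment, not options that exist in the benchmark script.

```python
# Hypothetical benchmark-level flags, as suggested above; they are not part of
# the merged PR, which reuses the engine's skip_tokenizer_init option instead.
import argparse

parser = argparse.ArgumentParser(description="throughput benchmark (sketch)")
parser.add_argument("--skip-tokenization", action="store_true",
                    help="feed pre-generated token IDs instead of text prompts")
parser.add_argument("--skip-detokenization", action="store_true",
                    help="request detokenize=False so outputs stay as token IDs")
args = parser.parse_args()

# Downstream, --skip-tokenization would select token-ID inputs (as in this
# PR's change) and --skip-detokenization would map to
# SamplingParams(detokenize=False), the behavior handled by #11697.
```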

@maleksan85
Contributor Author

maleksan85 commented Mar 6, 2025

I don't think we support the skip_tokenizer_init option yet in V1 (we should make sure there's a check on the V1 path that raises an appropriate error).

@maleksan85 WDYT about handling this in the benchmarks without using that option - in particular having explicit skip_tokenization and skip_detokenization benchmark options where the former controls whether token_ids are used instead of text prompts (per your change in this PR), and the latter is handled by #11697?

This should work in V1 now that #14224 is merged.

This is a BUGFIX PR that restores the functionality to exclude tokenization/detokenization from model inference. I tried it on V1 and haven't seen any errors. If you want an extra flag, which the vLLM community was against when I created this PR, sure. Personally, however, I don't see a reason for new flag(s).

@Alexei-V-Ivanov-AMD
Contributor

/ready

@maleksan85
Contributor Author

@comaniac would you be able to help land this PR?

Collaborator

@comaniac comaniac left a comment

Otherwise LGTM

@maleksan85
Contributor Author

Otherwise LGTM

Great, please mark it ready then if there are no other concerns.

@comaniac comaniac added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 6, 2025
@comaniac comaniac enabled auto-merge (squash) March 6, 2025 23:18
@vllm-bot vllm-bot merged commit 0ca3b8e into vllm-project:main Mar 7, 2025
61 of 62 checks passed
Member

@njhill njhill left a comment

I don't think we support the skip_tokenizer_init option yet in V1 (we should make sure there's a check on the V1 path that raises an appropriate error).
@maleksan85 WDYT about handling this in the benchmarks without using that option - in particular having explicit skip_tokenization and skip_detokenization benchmark options where the former controls whether token_ids are used instead of text prompts (per your change in this PR), and the latter is handled by #11697?
This should work in V1 now that #14224 is merged.

This is a BUGFIX PR that restores the functionality to exclude tokenization/detokenization from model inference. I tried it on V1 and haven't seen any errors. If you want an extra flag, which the vLLM community was against when I created this PR, sure. Personally, however, I don't see a reason for new flag(s).

@maleksan85 apologies, LGTM too. Would it make sense to apply a similar change to the other benchmark scripts benchmark_latency.py, benchmark_prefix_caching.py, benchmark_prioritization.py?

@maleksan85
Contributor Author

@maleksan85 apologies, LGTM too. Would it make sense to apply a similar change to the other benchmark scripts benchmark_latency.py, benchmark_prefix_caching.py, benchmark_prioritization.py?

As far as I understand, benchmark_latency.py only sends token IDs instead of text prompts, so it inherently supports skip_tokenizer_init in the same way as the change I made for the throughput benchmark.

dummy_prompt_token_ids = np.random.randint(10000,

For the rest of the benchmarks, it is up to the developers of those functionalities.
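For reference, a reconstructed sketch of the line quoted above from benchmark_latency.py (the size arguments and surrounding variable names are assumptions, not a verbatim copy): the script builds its prompts from random token IDs, so no tokenizer is needed on the input path.

```python
import numpy as np

# Reconstruction of how benchmark_latency.py builds its dummy inputs; the
# batch_size and input_len values stand in for the script's CLI arguments.
batch_size, input_len = 8, 128
dummy_prompt_token_ids = np.random.randint(10000, size=(batch_size, input_len))
dummy_prompts = [
    {"prompt_token_ids": row} for row in dummy_prompt_token_ids.tolist()
]
```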

Labels: force-merge, ready (ONLY add when PR is ready to merge/full CI is needed)
7 participants